ENH: Add incremental algorithms support #160
base: main
Conversation
Force-pushed from 5986180 to 535c1e4.
/azp run CI
Azure Pipelines failed to run 1 pipeline(s).

/azp run ml-benchmarks
No pipelines are associated with this pull request.

/azp run
Azure Pipelines successfully started running 1 pipeline(s).
@@ -239,6 +239,7 @@ def get_result_tables_as_df(
     bench_cases = pd.DataFrame(
         [flatten_dict(bench_case) for bench_case in results["bench_cases"]]
     )
+    bench_cases = bench_cases.map(lambda x: str(x) if not isinstance(x, Hashable) else x)
What is the non-hashable object you are trying to convert?
The basic statistics result_options parameter is a list.
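For context, a minimal reproduction of the problem (the DataFrame below is invented for illustration): list-valued cells are not hashable, so pandas operations that rely on hashing, such as drop_duplicates, raise a TypeError until the cells are stringified.

import pandas as pd
from collections.abc import Hashable

df = pd.DataFrame({"result_options": [["min", "max"], ["min", "max"]]})

# df.drop_duplicates() here raises TypeError: unhashable type: 'list'

# Stringifying non-hashable cells restores hashability
# (DataFrame.map requires pandas >= 2.1; older versions use applymap):
df = df.map(lambda x: str(x) if not isinstance(x, Hashable) else x)
print(df.drop_duplicates())  # a single row remains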
 def create_online_function(
     estimator_instance, method_instance, data_args, num_batches, batch_size
 ):
     if "y" in list(inspect.signature(method_instance).parameters):

         def ndarray_function(x, y):
-            for i in range(n_batches):
+            for i in range(num_batches):
Leave the old simple logic with batch_size only.
Why?
Why change? It overcomplicates data slicing with extra parameter checks and calculations. Also, it is more common to know the batch size before the partial_fit call in real-world cases.
> Why change?

Adding a new feature which can be useful.

> It overcomplicates data slicing with extra parameter checks and calculations

It costs nothing, and doing the calculations in the code is better than doing them on a calculator before running the benchmarks.

> it is more common to know batch size before partial_fit call in real world cases.

But while benchmarking it is no less common (I'd say even more so) for the user to want to specify the exact number of partial_fit calls.
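To make the trade-off concrete, here is a minimal sketch, not the PR's actual code (make_batches is an invented name), of how slicing can accept either num_batches or batch_size and derive the missing one from the data length:

import numpy as np

def make_batches(x, num_batches=None, batch_size=None):
    # Derive whichever parameter was not given (ceiling division);
    # at least one of the two must be provided.
    n_samples = x.shape[0]
    if batch_size is None:
        batch_size = (n_samples + num_batches - 1) // num_batches
    elif num_batches is None:
        num_batches = (n_samples + batch_size - 1) // batch_size
    for i in range(num_batches):
        yield x[i * batch_size : (i + 1) * batch_size]

x = np.arange(10).reshape(10, 1)
for batch in make_batches(x, num_batches=3):
    print(batch.shape)  # (4, 1), (4, 1), (2, 1)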
if method == "partial_fit":
    num_batches = get_bench_case_value(bench_case, "data:num_batches")
    batch_size = get_bench_case_value(bench_case, "data:batch_size")
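The hunk above pulls both values out of the bench case by colon-separated paths. Presumably get_bench_case_value walks the nested config; a hedged sketch of such a lookup (get_value is an invented stand-in, not the repo's helper):

def get_value(case, path, default=None):
    # Walk a nested dict by a colon-separated key path.
    node = case
    for key in path.split(":"):
        if not isinstance(node, dict) or key not in node:
            return default
        node = node[key]
    return node

bench_case = {"data": {"num_batches": 3, "batch_size": None}}
print(get_value(bench_case, "data:num_batches"))  # 3
print(get_value(bench_case, "data:batch_size"))   # None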
Instead of a separate branch for partial_fit, extend the mechanism of online_inference_mode to partial fitting too.
Could you provide an exact link to the implementation of this mechanism? I was not able to find any usage of this parameter; I only see it being set in the config.
Actually, online_inference_mode was removed as unnecessary before the merge of the refactor branch. This mode is now enabled only by batch_size != None.
You can split the batch size into two: one for training and one for inference.
> Actually, online_inference_mode was removed as unnecessary before the merge of the refactor branch.

What should I extend then?
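The training/inference split suggested above matches the {"training": ...} pattern visible in the config snippets below. A hedged sketch of how a stage-keyed value might be resolved (resolve_stage_value is an invented name):

def resolve_stage_value(value, stage):
    # Accept either a scalar applied to every stage, or a dict keyed by stage.
    if isinstance(value, dict):
        return value.get(stage)
    return value

print(resolve_stage_value({"training": 2}, "training"))   # 2
print(resolve_stage_value({"training": 2}, "inference"))  # None
print(resolve_stage_value(1000, "inference"))             # 1000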
configs/incremental.json (Outdated)

Suggested change:
-    "library": "sklearnex",
-    "num_batches": {"training": 2}
+    "library": "sklearnex"
configs/incremental.json (Outdated)

Suggested change:
-    "library": "sklearnex",
-    "num_batches": {"training": 2}
+    "library": "sklearnex"
configs/incremental.json (Outdated)

Suggested change:
-    "library": "sklearnex.preview",
-    "num_batches": {"training": 2}
+    "library": "sklearnex.preview"
Description